Video-and-language pre-training has shown promising results for learning generalizable representations. Most existing approaches model video and text in an implicit manner, without considering explicit structural representations of the multi-modal content. We denote this form of representation as structural knowledge, which expresses rich semantics at multiple granularities. Related works have proposed object-aware approaches that inject similar knowledge as inputs. However, the existing methods usually fail to effectively utilize such knowledge as regularizations to shape a superior cross-modal representation space. To this end, we propose a Cross-modaL knOwledge-enhanced Pre-training (CLOP) method with Knowledge Regularizations. Our method has two key designs: 1) a simple yet effective Structural Knowledge Prediction (SKP) task to pull together the latent representations of similar videos; and 2) a novel Knowledge-guided sampling approach for Contrastive Learning (KCL) to push apart cross-modal hard negative samples. We evaluate our method on four text-video retrieval tasks and one multi-choice QA task. The experiments show clear improvements, outperforming prior works by a substantial margin. Besides, we provide ablations and insights into how our methods affect the latent representation space, demonstrating the value of incorporating knowledge regularizations into video-and-language pre-training.
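As a rough illustration of the KCL idea, the sketch below selects in-batch hard negatives for a symmetric video-text InfoNCE loss using a precomputed knowledge-similarity matrix. The ranking rule, the `knowledge_sim` input, and all hyper-parameters are assumptions made for illustration; the paper's exact sampling scheme may differ.

```python
import torch
import torch.nn.functional as F

def kcl_loss(video_emb, text_emb, knowledge_sim, temperature=0.07, top_k=16):
    """Contrastive loss with knowledge-guided hard-negative selection (sketch).

    video_emb, text_emb : (B, D) embeddings of paired clips/captions.
    knowledge_sim       : (B, B) float tensor of similarity between the samples'
                          structural knowledge (e.g., entity/triplet overlap),
                          assumed to be precomputed; the diagonal is ignored.
    For each anchor, only the top_k in-batch negatives ranked by knowledge
    similarity are kept as hard negatives; the rest are masked out.
    """
    b = video_emb.size(0)
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature                       # (B, B), diagonal = positives

    # Rank candidate negatives by knowledge similarity (excluding the positive).
    neg_sim = knowledge_sim.clone()
    neg_sim.fill_diagonal_(float("-inf"))
    topk = neg_sim.topk(min(top_k, b - 1), dim=-1).indices

    keep = torch.zeros(b, b, dtype=torch.bool, device=logits.device)
    rows = torch.arange(b, device=logits.device).unsqueeze(1)
    keep[rows, topk] = True
    keep.fill_diagonal_(True)                              # always keep the positive pair

    masked_logits = logits.masked_fill(~keep, float("-inf"))
    labels = torch.arange(b, device=logits.device)
    # Symmetric video->text and text->video InfoNCE over the selected candidates.
    return 0.5 * (F.cross_entropy(masked_logits, labels)
                  + F.cross_entropy(masked_logits.t(), labels))
```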
Data augmentation helps neural networks generalize better by enlarging the training set, but how to effectively augment graph data to improve the performance of GNNs (graph neural networks) remains an open question. While most existing graph regularizers focus on manipulating the graph topological structure by adding or removing edges, we offer a method to augment node features for better performance. We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. By making the model invariant to small fluctuations in the input data, our method helps the model generalize to out-of-distribution samples and boosts performance at test time. FLAG is a general-purpose approach for graph data that works universally across node classification, link prediction, and graph classification tasks. FLAG is also highly flexible and scalable, and is deployable with arbitrary GNN backbones and on large-scale datasets. We demonstrate the efficacy and stability of our method through extensive experiments and ablation studies. We also provide intuitive observations for a deeper understanding of our method.
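The following is a minimal sketch of a FLAG-style training step in the spirit of "free" adversarial training, assuming a PyG-like `model(x, edge_index)` signature and a supervised loss; the inner-loop count and step size are illustrative defaults rather than the paper's settings.

```python
import torch

def flag_train_step(model, x, y, edge_index, optimizer, loss_fn,
                    m=3, step_size=1e-3):
    """One training step with FLAG-style adversarial feature augmentation (sketch).

    Node features x are perturbed for m inner ascent steps; parameter gradients
    from every inner step are accumulated before a single optimizer update, so
    the extra robustness comes at roughly the cost of m forward/backward passes.
    """
    model.train()
    optimizer.zero_grad()

    # Random initialization of the perturbation on the node features.
    perturb = torch.empty_like(x).uniform_(-step_size, step_size)
    perturb.requires_grad_(True)

    loss = loss_fn(model(x + perturb, edge_index), y) / m
    for _ in range(m - 1):
        loss.backward()
        # Ascent step on the perturbation using the sign of its gradient.
        perturb.data = perturb.data + step_size * torch.sign(perturb.grad.data)
        perturb.grad.zero_()
        loss = loss_fn(model(x + perturb, edge_index), y) / m
    loss.backward()
    optimizer.step()
    return loss.item()
```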
Convolutional Neural Networks (CNNs) achieve impressive performance in a wide variety of fields. Their success benefited from a massive boost when very deep CNN models were able to be reliably trained. Despite their merits, CNNs fail to properly address problems with non-Euclidean data. To overcome this challenge, Graph Convolutional Networks (GCNs) build graphs to represent non-Euclidean data, borrow concepts from CNNs, and apply them in training. GCNs show promising results, but they are usually limited to very shallow models due to the vanishing gradient problem (see Figure 1). As a result, most state-of-the-art GCN models are no deeper than 3 or 4 layers. In this work, we present new ways to successfully train very deep GCNs. We do this by borrowing concepts from CNNs, specifically residual/dense connections and dilated convolutions, and adapting them to GCN architectures. Extensive experiments show the positive effect of these deep GCN frameworks. Finally, we use these new concepts to build a very deep 56-layer GCN, and show how it significantly boosts performance (+3.7% mIoU over state-of-the-art) in the task of point cloud semantic segmentation. We believe that the community can greatly benefit from this work, as it opens up many opportunities for advancing GCN-based research.
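A minimal sketch of the residual idea applied to a GCN layer is shown below; it uses a dense normalized adjacency and a vanilla GCN update for brevity, whereas the paper's deep GCNs additionally use dense connections and dilated graph convolutions on point clouds.

```python
import torch
import torch.nn as nn

class ResGCNLayer(nn.Module):
    """GCN layer with a residual (identity) connection (sketch, not the paper's exact block)."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
        self.act = nn.ReLU()

    def forward(self, x, adj_norm):
        # adj_norm: (N, N) symmetrically normalized adjacency (dense here for brevity).
        h = self.act(self.norm(self.linear(adj_norm @ x)))
        return x + h   # the residual path lets gradients flow through very deep stacks

class DeepGCN(nn.Module):
    """Stack of residual GCN layers; depth is limited mainly by memory, not by vanishing gradients."""
    def __init__(self, dim, num_layers=56):
        super().__init__()
        self.layers = nn.ModuleList(ResGCNLayer(dim) for _ in range(num_layers))

    def forward(self, x, adj_norm):
        for layer in self.layers:
            x = layer(x, adj_norm)
        return x
```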
Despite the fast advances in high-sigma yield analysis with the help of machine learning techniques in the past decade, one of the main challenges, the curse of dimensionality, which is inevitable when dealing with modern large-scale circuits, remains unsolved. To resolve this challenge, we propose an absolute shrinkage deep kernel learning, ASDK, which automatically identifies the dominant process variation parameters in a nonlinear-correlated deep kernel and acts as a surrogate model to emulate the expensive SPICE simulation. To further improve the yield estimation efficiency, we propose a novel maximization of approximated entropy reduction for an efficient model update, which is also enhanced with parallel batch sampling for parallel computing, making it ready for practical deployment. Experiments on SRAM column circuits demonstrate the superiority of ASDK over the state-of-the-art (SOTA) approaches in terms of accuracy and efficiency with up to 10.3x speedup over SOTA methods.
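A rough sketch of a deep kernel with lasso-style shrinkage on per-parameter relevance weights is given below; the network, the kernel form, and the penalty are assumptions meant only to illustrate how absolute shrinkage can expose dominant process-variation parameters, not ASDK's actual formulation.

```python
import torch
import torch.nn as nn

class DeepKernel(nn.Module):
    """Deep kernel with per-dimension (ARD-style) weights under an L1 penalty (sketch).

    The L1 term plays the "absolute shrinkage" role: dimensions of the process-
    variation vector whose weights are driven toward zero are treated as
    non-dominant and contribute little to the learned kernel.
    """
    def __init__(self, in_dim, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(),
                                 nn.Linear(64, feat_dim))
        self.ard = nn.Parameter(torch.ones(in_dim))    # per-parameter relevance weights

    def forward(self, x1, x2):
        z1 = self.net(x1 * self.ard)                    # shrink irrelevant inputs first
        z2 = self.net(x2 * self.ard)
        d = torch.cdist(z1, z2) ** 2
        return torch.exp(-0.5 * d)                      # RBF kernel on learned features

    def shrinkage_penalty(self, lam=1e-2):
        return lam * self.ard.abs().sum()
```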
Gaze estimation is of great importance to many scientific fields and everyday applications, ranging from fundamental research in cognitive psychology to attention-aware mobile systems. While recent advances in deep learning have achieved great success in building highly accurate gaze estimation systems, the associated high computational cost and the reliance on large-scale labeled gaze data for supervised learning place challenges on the practical use of existing solutions. To move beyond these limitations, we propose FreeGaze, a resource-efficient framework for unsupervised gaze representation learning. FreeGaze incorporates frequency-domain gaze estimation and contrastive gaze representation learning in its design. The former significantly alleviates the computational burden in both system calibration and gaze estimation and dramatically reduces system latency, while the latter overcomes the data labeling hurdle of existing supervised learning-based counterparts and ensures efficient gaze representation learning in the absence of gaze labels. Our evaluation on two gaze estimation datasets shows that FreeGaze can achieve gaze estimation accuracy comparable to existing supervised learning-based approaches, while enabling up to 6.81x and 1.67x speedups in system calibration and gaze estimation, respectively.
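The snippet below sketches one way frequency-domain preprocessing can shrink the input before representation learning: a 2-D DCT followed by keeping only the low-frequency coefficients. The transform choice and block size are assumptions, not FreeGaze's exact pipeline.

```python
import numpy as np
from scipy.fft import dctn

def frequency_domain_features(eye_image, keep=16):
    """Compress an eye image into its low-frequency DCT coefficients (sketch).

    eye_image: (H, W) grayscale array. Keeping only the top-left `keep` x `keep`
    block of 2-D DCT coefficients retains most of the coarse appearance
    information at a fraction of the original input size, which is the kind of
    saving a frequency-domain gaze pipeline relies on.
    """
    coeffs = dctn(eye_image.astype(np.float32), norm="ortho")
    return coeffs[:keep, :keep].reshape(-1)
```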
We propose a penalized nonparametric approach to estimating the quantile regression process (QRP) in a nonseparable model using rectified quadratic unit (ReQU) activated deep neural networks, and introduce a novel penalty function to enforce non-crossing of the quantile regression curves. We establish non-asymptotic excess risk bounds for the estimated QRP and derive the mean integrated squared error of the estimated QRP under mild smoothness and regularity conditions. To establish these non-asymptotic risk and estimation error bounds, we also develop a new error bound for approximating $C^s$ smooth functions with $s > 0$, together with their derivatives, using ReQU activated neural networks. This is a new approximation result for ReQU networks, is of independent interest, and may be useful in other problems. Our numerical experiments show that the proposed method is competitive with, or outperforms, two existing methods, including approaches using reproducing kernels and random forests, for nonparametric quantile regression.
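A small sketch of the kind of objective this abstract describes is given below: a ReQU activation, the pinball (check) loss averaged over a grid of quantile levels, and a hinge-type penalty discouraging crossing of adjacent quantile curves. The `net(x, tau)` interface and the exact penalty form are assumptions made for illustration, not the paper's formulation.

```python
import torch
import torch.nn as nn

class ReQU(nn.Module):
    """Rectified quadratic unit: x -> max(x, 0)^2 (smooth, unlike ReLU)."""
    def forward(self, x):
        return torch.clamp(x, min=0.0) ** 2

def quantile_process_loss(net, x, y, taus, lam=1.0):
    """Pinball loss over a grid of quantile levels plus a non-crossing penalty (sketch).

    net(x, tau) -> predicted conditional quantile at level tau; taus is assumed
    sorted in increasing order. The penalty charges any pair of adjacent levels
    whose predicted curves cross.
    """
    loss = 0.0
    preds = []
    for tau in taus:
        q = net(x, tau)
        u = y - q
        loss = loss + torch.mean(torch.maximum(tau * u, (tau - 1) * u))  # check loss
        preds.append(q)
    # Non-crossing: a higher quantile level should give a (weakly) higher prediction.
    for lo, hi in zip(preds[:-1], preds[1:]):
        loss = loss + lam * torch.mean(torch.relu(lo - hi))
    return loss
```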
Generative adversarial networks (GANs) have proven highly successful in image generation tasks, but GAN training suffers from instability. Many works improve the stability of GAN training by manually modifying the GAN architecture, which requires human expertise and extensive trial and error. Neural architecture search (NAS), which aims to automate model design, has therefore been applied to searching for GANs on the task of unconditional image generation. Early NAS-GAN works searched only the generator to reduce the difficulty. Some recent works attempt to search both the generator (G) and the discriminator (D) to improve GAN performance, but they still suffer from the instability of GAN training during the search. To alleviate this instability, we propose an efficient two-stage evolutionary algorithm (EA) based NAS framework to discover GANs, dubbed EAGAN. Specifically, we decouple the search of G and D into two stages and propose a weight-resetting strategy to improve the stability of GAN training. In addition, we perform evolution operations to produce Pareto-front architectures under multiple objectives, yielding a superior combination of G and D. By leveraging the weight-sharing strategy and low-fidelity evaluation, EAGAN can significantly shorten the search time. EAGAN achieves highly competitive results on CIFAR-10 (IS = 8.81 ± 0.10, FID = 9.91) and surpasses previously NAS-searched GANs on the STL-10 dataset (IS = 10.44 ± 0.087, FID = 22.18).
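To make the multi-objective selection step concrete, the sketch below computes a Pareto front over candidate architectures scored by two objectives (e.g., IS and negative FID, both to be maximized); it illustrates non-dominated selection in general, not EAGAN's specific evolution operators or weight-resetting schedule.

```python
def pareto_front(candidates):
    """Return the non-dominated architectures under multiple objectives (sketch).

    candidates: list of (arch_id, objectives) with objectives a tuple of scores
    to be maximized, e.g. (IS, -FID). An architecture is dominated if some other
    candidate is at least as good on every objective and strictly better on one.
    """
    front = []
    for i, (arch_id, obj) in enumerate(candidates):
        dominated = any(
            all(o_other >= o_self for o_self, o_other in zip(obj, other)) and other != obj
            for j, (_, other) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append(arch_id)
    return front

# Example with two objectives (Inception Score, negative FID), both maximized.
cands = [("g1", (8.5, -12.0)), ("g2", (8.8, -10.0)), ("g3", (8.2, -9.5))]
print(pareto_front(cands))   # ['g2', 'g3']: g1 is dominated by g2
```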
The COVID-19 pandemic threatens global health. Many studies have applied deep convolutional neural networks (CNNs) to recognize COVID-19 from chest 3D computed tomography (CT) scans. Recent works show that no single model generalizes well across CT datasets from different countries, and designing a model for a specific dataset requires expertise. Neural architecture search (NAS), which aims to search for models automatically, has therefore become an attractive solution. To reduce the search cost on large 3D CT datasets, most NAS-based works use a weight-sharing (WS) strategy to make all candidate models share weights within a supernet. However, WS inevitably causes search instability, leading to inaccurate model estimation. In this work, we propose an efficient Evolutionary Multi-objective ARchitecture Search (EMARS) framework. We propose a new objective, called potential, which helps exploit promising models and indirectly reduces the number of models involved in weight training, thereby alleviating search instability. We show that under the objectives of accuracy and potential, EMARS balances exploitation and exploration, i.e., it reduces search time while finding better models. Our searched models are small and perform better than prior works on three public COVID-19 3D CT datasets.
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT exhibits strong robustness even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
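A toy sketch of the token-level fusion described here is given below: object queries attend over the concatenation of image and point-cloud tokens in a standard transformer decoder. Token construction, positional encoding, and the output heads are simplified assumptions rather than CMT's actual architecture.

```python
import torch
import torch.nn as nn

class CrossModalDetectorSketch(nn.Module):
    """DETR-style decoder over concatenated image and point-cloud tokens (sketch)."""
    def __init__(self, dim=256, num_queries=100, num_layers=6, num_classes=10):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.box_head = nn.Linear(dim, 10)           # e.g. center(3)+size(3)+yaw(2)+velocity(2)
        self.cls_head = nn.Linear(dim, num_classes)  # nuScenes uses 10 foreground classes

    def forward(self, image_tokens, point_tokens):
        # image_tokens: (B, N_img, dim), point_tokens: (B, N_pts, dim)
        memory = torch.cat([image_tokens, point_tokens], dim=1)   # joint multi-modal memory
        q = self.queries.unsqueeze(0).expand(memory.size(0), -1, -1)
        h = self.decoder(q, memory)                  # queries attend to both modalities
        return self.box_head(h), self.cls_head(h)
```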
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
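As a concrete illustration of the simpler of the two attacks, the sketch below stamps a corner trigger onto a fraction of the raw training images and relabels them before distillation begins, in the spirit of NAIVEATTACK; the trigger pattern, location, and poisoning fraction are assumptions rather than the paper's settings.

```python
import numpy as np

def stamp_trigger(images, labels, target_label, poison_frac=0.1, trigger_size=3, seed=0):
    """NAIVEATTACK-style poisoning of the raw dataset before distillation (sketch).

    A small white square is stamped in the bottom-right corner of a random fraction
    of images, whose labels are flipped to the attacker's target class. The
    distillation procedure then runs unchanged on the poisoned data.
    Assumes images of shape (N, H, W, C) with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    idx = rng.choice(len(images), size=int(poison_frac * len(images)), replace=False)
    images[idx, -trigger_size:, -trigger_size:, :] = 1.0   # stamp the trigger patch
    labels[idx] = target_label                             # flip to the target class
    return images, labels
```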